A High-Performance Syntactic and Semantic Dependency Parser

Authors

  • Anders Björkelund
  • Bernd Bohnet
  • Love Hafdell
  • Pierre Nugues
Abstract

This demonstration presents a high-performance syntactic and semantic dependency parser. The system consists of a pipeline of modules that carry out the tokenization, lemmatization, part-of-speech tagging, dependency parsing, and semantic role labeling of a sentence. The system's two main components draw on improved versions of a state-of-the-art dependency parser (Bohnet, 2009) and semantic role labeler (Björkelund et al., 2009) developed independently by the authors. The system takes a sentence as input and produces a syntactic and semantic annotation using the CoNLL 2009 format. The processing time needed for a sentence typically ranges from 10 to 1000 milliseconds. The predicate–argument structures in the final output are visualized in the form of segments, which are more intuitive for a user.

Authors are listed in alphabetical order.

1 Motivation and Overview

Semantic analyzers consist of processing pipelines that tokenize, lemmatize, tag, and parse sentences, and every step is crucial to their overall performance. In practice, however, while the code of dependency parsers and semantic role labelers is available, few systems can be run as standalone applications, and even fewer have a processing time per sentence that would allow user interaction, i.e., a system response time ranging from 100 to 1000 milliseconds.

This demonstration is a practical semantic parser that takes an English sentence as input and produces syntactic and semantic dependency graphs using the CoNLL 2009 format. It builds on lemmatization and POS tagging preprocessing steps, as well as on two systems, one dealing with syntactic and the other with semantic dependencies, each of which reported state-of-the-art results in the CoNLL 2009 shared task (Bohnet, 2009; Björkelund et al., 2009). The complete system architecture is shown in Fig. 1.

The dependency parser is based on Carreras's algorithm (Carreras, 2007) and second-order spanning trees. The parser is trained with the margin infused relaxed algorithm (MIRA) (McDonald et al., 2005) and combined with a hash kernel (Shi et al., 2009). In combination with the system's lemmatizer and POS tagger, this parser achieves an average labeled attachment score (LAS) of 89.88 when trained and tested on the English corpus of the CoNLL 2009 shared task (Surdeanu et al., 2008).

The semantic role labeler (SRL) consists of a pipeline of independent, local classifiers that identify the predicates, their senses, the arguments of the predicates, and the argument labels. The SRL module achieves an average labeled semantic F1 of 80.90 when trained and tested on the English corpus of CoNLL 2009 and combined with the system's preprocessing steps and parser.
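The labeled attachment score quoted above can be illustrated with a small sketch. It assumes the tab-separated CoNLL 2009 column layout (ID, FORM, LEMMA, PLEMMA, POS, PPOS, FEAT, PFEAT, HEAD, PHEAD, DEPREL, PDEPREL, FILLPRED, PRED, then argument columns), with gold annotations in HEAD/DEPREL and system predictions in PHEAD/PDEPREL. The helper names (`make_row`, `read_sentence`, `labeled_attachment_score`) and the four-token sample are hypothetical and are not part of the demonstrated system. The sketch counts every token, including punctuation, which may differ from the conventions of a particular scorer.

```python
from typing import List

# 0-based column positions, assuming the CoNLL 2009 layout:
# ID FORM LEMMA PLEMMA POS PPOS FEAT PFEAT HEAD PHEAD DEPREL PDEPREL FILLPRED PRED ...
HEAD, PHEAD, DEPREL, PDEPREL = 8, 9, 10, 11
NUM_FIXED_COLUMNS = 14


def make_row(idx: int, form: str, head: int, phead: int,
             deprel: str, pdeprel: str) -> str:
    """Build one tab-separated token line; unused columns are filled with '_'."""
    cols = ["_"] * NUM_FIXED_COLUMNS
    cols[0], cols[1] = str(idx), form
    cols[HEAD], cols[PHEAD] = str(head), str(phead)
    cols[DEPREL], cols[PDEPREL] = deprel, pdeprel
    return "\t".join(cols)


def read_sentence(block: str) -> List[List[str]]:
    """Split one sentence block into token rows, skipping blank lines."""
    return [line.split("\t") for line in block.splitlines() if line.strip()]


def labeled_attachment_score(rows: List[List[str]]) -> float:
    """Percentage of tokens whose predicted head AND label both match gold."""
    if not rows:
        raise ValueError("cannot score an empty sentence")
    correct = sum(
        1 for r in rows
        if r[HEAD] == r[PHEAD] and r[DEPREL] == r[PDEPREL]
    )
    return 100.0 * correct / len(rows)


# Hypothetical sample: the prediction attaches "The" to the wrong head.
SAMPLE = "\n".join([
    make_row(1, "The", 2, 3, "NMOD", "NMOD"),
    make_row(2, "parser", 3, 3, "SBJ", "SBJ"),
    make_row(3, "works", 0, 0, "ROOT", "ROOT"),
    make_row(4, ".", 3, 3, "P", "P"),
])

rows = read_sentence(SAMPLE)
print(f"LAS = {labeled_attachment_score(rows):.2f}")  # LAS = 75.00
```

Three of the four tokens have both the correct head and the correct label, so the sketch reports 75.00. A corpus-level score such as the 89.88 reported here would aggregate the same counts over all tokens of the test set rather than averaging per sentence.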

Similar resources

Feature Engineering in Persian Dependency Parser

A dependency parser is one of the most important fundamental tools in natural language processing; it extracts the structure of sentences and determines the relations between words based on dependency grammar. Dependency parsing is well suited to free word order languages such as Persian. In this paper, a data-driven dependency parser has been developed with the help of phrase-structure parser fo...

Automatic Semantic Role Labeling of Persian Sentences Using Dependency Trees

Automatically identifying words with semantic roles (such as Agent, Patient, Source, etc.) in sentences and attaching the correct semantic roles to them may lead to improvements in many natural language processing tasks, including information extraction, question answering, text summarization and machine translation. Semantic role labeling systems usually take advantage of syntactic parsing and th...

Semantic Role Labeling of Persian Sentences with a Memory-Based Learning Approach

Extracting semantic roles is one of the major steps in representing text meaning. It refers to finding the semantic relations between a predicate and syntactic constituents in a sentence. In this paper we present a semantic role labeling system for Persian, using a memory-based learning model and standard features. Our proposed system implements a two-phase architecture to first identify...

Dependency-based Semantic Role Labeling of PropBank

We present a PropBank semantic role labeling system for English that is integrated with a dependency parser. To tackle the problem of joint syntactic–semantic analysis, the system relies on a syntactic and a semantic subcomponent. The syntactic model is a projective parser using pseudo-projective transformations, and the semantic model uses global inference mechanisms on top of a pipeline of cl...

Studying impressive parameters on the performance of Persian probabilistic context free grammar parser

In linguistics, a tree bank is a parsed text corpus that annotates syntactic or semantic sentence structure. The exploitation of tree bank data has been important ever since the first large-scale tree bank, The Penn Treebank, was published. However, although originating in computational linguistics, the value of tree bank is becoming more widely appreciated in linguistics research as a whole. F...

Semantic Dependency Parsing using N-best Semantic Role Sequences and Roleset Information

In this paper, we describe a syntactic and semantic dependency parsing system submitted to the shared task of CoNLL 2008. The proposed system consists of five modules: syntactic dependency parser, predicate identifier, local semantic role labeler, global role sequence candidate generator, and role sequence selector. The syntactic dependency parser is based on Malt Parser and the sequence candid...

Publication date: 2010